Distributed Human Computation Framework for Linked Data Co-reference Resolution

نویسندگان

  • Yang Yang
  • Priyanka Singh
  • Jiadi Yao
  • Ching-man Au Yeung
  • Amir Zareian
  • Xiaowei Wang
  • Zhonglun Cai
  • Manuel Salvadores
  • Nicholas Gibbins
  • Wendy Hall
  • Nigel Shadbolt
چکیده

Distributed Human Computation (DHC) is used to solve computational problems by incorporating the collaborative effort of a large number of humans. It is also a solution to AI-complete problems such as natural language processing. The Semantic Web with its root in AI has many research problems that are considered as AI-complete. E.g. co-reference resolution, which involves determining whether different URIs refer to the same entity, is a significant hurdle to overcome in the realisation of large-scale Semantic Web applications. In this paper, we propose a framework for building a DHC system on top of the Linked Data Cloud to solve various computational problems. To demonstrate the concept, we are focusing on handling the co-reference resolution when integrating distributed datasets. Traditionally machine-learning algorithms are used as a solution for this but they are often computationally expensive, error-prone and do not scale. We designed a DHC system named iamResearcher, which solves the scientific publication author identity coreference problem when integrating distributed bibliographic datasets. In our system, we aggregated 6 million bibliographic data from various publication repositories. Users can sign up to the system to audit and align their own publications, thus solving the co-reference problem in a distributed manner. The aggregated results are dereferenceable in the Open Linked Data Cloud.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Computing Identity Co-Reference Across Drug Discovery Datasets

This paper presents the rules used within the Open PHACTS (http://www.openphacts.org) Identity Management Service to compute co-reference chains across multiple datasets. The web of (linked) data has encouraged a proliferation of identifiers for the concepts captured in datasets; with each dataset using their own identifier. A key data integration challenge is linking the co-referent identifier...

متن کامل

Managing Co-reference on the Semantic Web

Co-reference resolution, or the determination of ‘equivalent’ URIs referring to the same concept or entity, is a significant hurdle to overcome in the realisation of large scale Semantic Web applications. However, it has only recently gained the attention of research communities in the Semantic Web context, and while activities are now underway in identifying co-referent or conflated URIs, litt...

متن کامل

A rule based solution to co-reference resolution in clinical text

OBJECTIVE To build an effective co-reference resolution system tailored to the biomedical domain. METHODS Experimental materials used in this study were provided by the 2011 i2b2 Natural Language Processing Challenge. The 2011 i2b2 challenge involves co-reference resolution in medical documents. Concept mentions have been annotated in clinical texts, and the mentions that co-refer in each doc...

متن کامل

Five Stars of Linked Data Vocabulary Use Editorial

In 2010 Tim Berners-Lee introduced a 5 star rating to his Linked Data design issues page to encourage data publishers along the road to good Linked Data. What makes the star rating so effective is its simplicity, clarity, and a pinch of psychology – is your data 5 star? While there is an abundance of 5 star Linked Data available today, finding, querying, and integrating/interlinking these data ...

متن کامل

Parallelizing Irregular Applications through the YAPPA Compilation Framework

Modern High Performance Computing (HPC) clusters are composed of hundred of nodes integrating multicore processors with advanced cache hierarchies. These systems can reach several petaflops of peak performance, but are optimized for floating point intensive applications, and regular, localizable data structures. The network interconnection of these systems is optimized for bulk, synchronous tra...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011